Key Points

• Seasonal prediction skill of the Arctic Oscillation in boreal winter 

• Prediction skill change depending on period 


Abstract 

This study assesses the prediction skill of the boreal winter Arctic Oscillation (AO) in the 
state-of-the-art dynamical ensemble prediction systems (EPSs): the UKMO GloSea4, the 
NCEP CFSv2, and the NASA GEOS-5. Long-term reforecasts made with the EPSs are used 
to evaluate representations of the AO, and to examine skill scores for the deterministic and 
probabilistic forecast of the AO index. The reforecasts reproduce the observed changes in the 
large-scale patterns of the Northern Hemispheric surface temperature, upper-level wind, and 
precipitation according to the AO phase. Results demonstrate that all EPSs have better 
prediction skill than the persistence prediction for lead times up to 3 months, suggesting
great potential for skillful prediction of the AO and the associated climate anomalies on
seasonal time scales. It is also found that the deterministic and probabilistic forecast skill of
the AO in the recent period (1997-2010) is higher than that in the earlier period (1983-1996). 


Index Terms and Keywords 


Climate variability; Coupled models of the climate system 
1. Introduction


The Arctic Oscillation (AO) [Thompson and Wallace, 1998], which is characterized by a
periodic exchange of atmospheric mass between the Arctic and the rest of the high
latitudes, is an important mode of climate variability in the Northern Hemisphere. When the
Arctic region has anomalously high atmospheric mass (the negative phase of the AO), the
circumpolar jet stream weakens and shifts southward, causing abnormally severe winters in
the mid-latitudes [Thompson and Wallace, 2000; Higgins et al., 2002; Wettstein and Mearns,
2002]. Given its profound impacts on winter climate over the Northern Hemisphere mid-
and high-latitude areas, the accuracy of seasonal prediction over these regions seems to be
strongly tied to our ability to predict the AO. This calls for a systematic assessment of the
prediction skill of the AO using forecasts made with operational forecast systems.

While the nature of the AO and the physical mechanisms underlying the phenomenon have
been extensively studied [Limpasuvan and Hartmann, 2000; Lorenz and Hartmann, 2003;
Polvani and Waugh, 2004; Cohen et al., 2010; Kim and Ahn, 2012, among many others],
studies focusing on the seasonal predictability or the prediction skill of the AO are
surprisingly rare in the literature. To our knowledge, only one study has examined the prediction skill
of the AO exclusively [Riddle et al., 2013], although Arribas et al. [2011] and Kim et al.
[2012] assessed the forecast skill of the North Atlantic Oscillation (NAO) as one of several modes of climate
variability investigated. Riddle et al. [2013] found that the National Centers for
Environmental Prediction (NCEP) coupled forecast system model version 2 (CFSv2) [Saha et
al., 2013] is capable of forecasting the wintertime AO at forecast lead times of more than 2
months. They suggested that the model hardly resolves the process associated with the
stratospheric pathway, that is, the propagation linked to October Eurasian
snow cover.
Motivated by the above, this study evaluates the AO prediction performance of three
state-of-the-art seasonal forecasting systems: the UK Met Office Global Seasonal forecasting
system version 4 (GloSea4) [Arribas et al., 2011], the NCEP CFSv2, and the National
Aeronautics and Space Administration (NASA) Goddard Earth Observing System Model,
Version 5 (GEOS-5) AOGCM [Rienecker et al., 2011]. These systems have been developed
independently with quite different model formulations and initialization processes. By
carefully examining multi-decadal reforecasts produced with these forecasting systems, we
aim to quantify the current level of AO prediction skill in modern seasonal forecast
systems, and to identify the differences in skill that are presumably due to the differences
in model formulation and the initialization processes.

Section 2 describes the data and methodology used in this study. The prediction skill of the AO
in the three reforecast datasets is presented in Section 3. A summary and conclusions are
given in Section 4.


2. Data and Methodology 

The following data were used in this research: the reforecasts from GloSea4 (1996-2009),
from CFSv2 (1982-2010), and from GEOS-5 (1981-2012). Detailed descriptions
of each reforecast are given in Table 1. Three ensemble members of GloSea4, perturbed by
stochastic physics, are initiated at fixed calendar dates of each month and integrated for 7
months. The reforecasts of CFSv2 are initialized every 5 days (from all 4 cycles of the day)
beginning with Jan 1st of each year, using a 9-hour coupled guess field. The GEOS-5
seasonal forecasts consist of a single ensemble member initialized every 5 days and
additional ensemble members, generated through coupled model breeding and independent
perturbations in the atmosphere and ocean, produced on the day closest to the beginning of the
month.

For this study, only ensemble members that were initialized in November and on the first
available day in December were used to evaluate the prediction skill of the boreal winter AO.
Note that the number of ensemble members differs among the systems (Table 1).
The number of ensemble members used is 15 for GloSea4, 28 for CFSv2, and 19 for GEOS-5.

For verification, we used the Modern-Era Retrospective Analysis for Research and
Applications (MERRA) [Rienecker et al., 2011] atmospheric reanalysis. MERRA has a
spatial resolution of 1/2° (latitude) x 2/3° (longitude), with 72 vertical levels. We note that
our results are not dependent on the choice of reanalysis. Almost identical results for the AO
index derived from an empirical orthogonal function (EOF) analysis using sea level pressure
(SLP) are obtained using ERA-Interim (the correlation coefficient of the DJF AO index between
ERA-Interim and MERRA is larger than 0.99). Additionally, data from the Global Precipitation
Climatology Project (GPCP) [Adler et al., 2003] are used to validate precipitation from the
models.

To obtain the characteristic pattern and time variation of the observed AO, an EOF analysis
was performed on seasonal-mean (DJF), Northern Hemisphere (north of 20°N) sea level
pressure data from MERRA. The resulting first EOF represents the AO mode, and the principal component (PC)
time series associated with the first EOF exhibits the interannual variation of the AO mode. The
three reforecast datasets are evaluated with respect to i) the fidelity with which they reproduce the observed
pattern of the AO, and ii) the capability to forecast the observed interannual variation of the
AO.
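The EOF step described above can be sketched as follows. This is only a minimal illustration, not the procedure actually used for the results: it assumes the anomalies are stored as a list of years, each a list of grid-point values, applies a sqrt(cos(latitude)) area weighting (a common convention that the text does not specify), and uses power iteration rather than a full eigen-decomposition so that the example has no external dependencies. The function name `leading_eof` and the synthetic data are hypothetical.

```python
import math

def leading_eof(anom, lats, n_iter=200):
    """Leading EOF, its PC, and explained variance fraction of anom[year][point].

    anom : seasonal-mean SLP anomalies with the time mean already removed
    lats : latitude (degrees) of each grid point, used for area weighting
    """
    n_t, n_x = len(anom), len(lats)
    # Assumed weighting: sqrt(cos(lat)), so squared values are area-weighted.
    w = [math.sqrt(math.cos(math.radians(lat))) for lat in lats]
    x = [[anom[t][j] * w[j] for j in range(n_x)] for t in range(n_t)]

    # Start from the largest-amplitude year, which lies in the data space.
    e = max(x, key=lambda row: sum(v * v for v in row))
    for _ in range(n_iter):
        # One power-iteration step: e <- X^T (X e), then rescale to unit length.
        pc = [sum(x[t][j] * e[j] for j in range(n_x)) for t in range(n_t)]
        e = [sum(x[t][j] * pc[t] for t in range(n_t)) for j in range(n_x)]
        norm = math.sqrt(sum(v * v for v in e))
        e = [v / norm for v in e]

    pc = [sum(x[t][j] * e[j] for j in range(n_x)) for t in range(n_t)]
    total = sum(v * v for row in x for v in row)
    explained = sum(v * v for v in pc) / total
    return e, pc, explained

# Synthetic example: one fixed dipole pattern whose amplitude varies by year.
lats = [80.0, 70.0, 40.0, 30.0]
pattern = [1.0, 0.8, -0.6, -0.5]
amps = [-2.0, -1.0, 0.0, 1.0, 2.0]
anom = [[a * p for p in pattern] for a in amps]
eof, pc, frac = leading_eof(anom, lats)
```

Because the synthetic field contains a single pattern, the leading mode explains essentially all of the variance and the PC is proportional to the prescribed amplitudes. The sign of an EOF is arbitrary, so in practice it would be fixed by convention (for example, so that the positive phase has low pressure over the Arctic).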


In order to evaluate the AO patterns reproduced by the prediction systems, the same EOF
analysis was applied to each ensemble member (see footnote 1). After obtaining the AO mode (i.e., the 1st or 2nd
EOF) from each ensemble member, we took an ensemble average of the AO patterns after
multiplying each pattern by the standard deviation of its PC. For comparison with these AO patterns from the
reforecast datasets, we multiplied the observed AO pattern by the standard deviation of its first PC.
Anomalous patterns of other variables associated with the AO were obtained by regressing the
variables onto the PC time series of the AO mode for each ensemble member, and then
averaging the regressed patterns over the ensemble.
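The regression and ensemble-averaging step can be illustrated with a short sketch. It assumes, as one plausible reading of the text, that each member's PC is standardized before the regression, so that the resulting map gives the anomaly per one standard deviation of the AO index; the function names and the data layout (`field[year][point]`) are hypothetical.

```python
from statistics import fmean, pstdev

def regress_on_pc(field, pc):
    """Regression map of field[year][point] onto a standardized PC."""
    n_t = len(pc)
    pc_mean, pc_std = fmean(pc), pstdev(pc)
    z = [(v - pc_mean) / pc_std for v in pc]          # standardized index
    n_x = len(field[0])
    col_mean = [fmean(field[t][j] for t in range(n_t)) for j in range(n_x)]
    # Covariance with a unit-variance index = anomaly per 1 std of the index.
    return [sum(z[t] * (field[t][j] - col_mean[j]) for t in range(n_t)) / n_t
            for j in range(n_x)]

def ensemble_mean_pattern(fields, pcs):
    """Regress each member on its own PC, then average the maps over members."""
    maps = [regress_on_pc(f, p) for f, p in zip(fields, pcs)]
    return [fmean(col) for col in zip(*maps)]
```

Regressing each member separately before averaging keeps the internal variability of every member in the regression; regressing the ensemble-mean field instead would damp that variability and give a different estimate.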

To assess the prediction skill of the AO using the reforecast datasets, either the seasonal or
monthly averaged forecast SLP anomaly was projected onto the observed AO pattern. The
resulting time series, after being normalized by its own standard deviation, is then used for the
forecast skill assessment. The temporal correlation coefficient between the observed and
forecast AO indices represents the prediction skill in this study. The forecast AO indices
were obtained by averaging the normalized time series from each ensemble member, and we
tried two ways of ensemble averaging. The first is a simple average, in which all
ensemble members have equal weight. The second is based on the argument that
ensemble members whose initialization time is closer to the target season should have larger
weights. In this method, we assigned an arbitrary weight (100) to the ensemble member
whose initialization time is closest to the target season (Dec. 2nd), and reduced the weight
as the initialization time becomes earlier (by 2 per day). Because both methods
showed similar forecast skill (not shown), we present here only the results obtained with the
second averaging method. The persistence forecast provides a baseline, and we
consider a prediction skill useful only when it exceeds that of the persistence forecast.
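The index construction and the lead-dependent weighting described above can be sketched as follows. The weight of 100 and the reduction of 2 per day are taken from the text; the function names, the data layout, and the plain (unweighted) dot product used for the projection are assumptions made for illustration only.

```python
import math
from statistics import fmean, pstdev

def project(anom, pattern):
    """AO index: projection of anom[year][point] onto the observed AO pattern."""
    return [sum(a * p for a, p in zip(row, pattern)) for row in anom]

def normalize(series):
    """Divide a time series by its own standard deviation."""
    s = pstdev(series)
    return [v / s for v in series]

def lead_weight(days_before_last_start, w_max=100.0, decay_per_day=2.0):
    """Weight 100 for the latest start date, reduced by 2 per day of extra lead."""
    return w_max - decay_per_day * days_before_last_start

def weighted_ensemble_index(member_indices, days_before):
    """Weighted ensemble mean of per-member normalized AO indices."""
    weights = [lead_weight(d) for d in days_before]
    total = sum(weights)
    n_years = len(member_indices[0])
    return [sum(w * m[t] for w, m in zip(weights, member_indices)) / total
            for t in range(n_years)]

def correlation(x, y):
    """Temporal (Pearson) correlation coefficient used as the skill measure."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

With this weighting, a member started 31 days before the latest start date still carries a weight of 38, so the November members remain influential; this is consistent with the statement that the weighted and simple averages give similar skill.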

1 In most cases, an AO-like pattern emerged as the first EOF. In some cases the second mode
was used. This was done if the pattern correlation between the second EOF and the AO
pattern from MERRA was higher than that of the leading EOF (this never occurred for GloSea4,
it occurred once for GEOS-5, and it occurred six times for CFSv2).
The Relative Operating Characteristic (ROC) score [Mason, 1982] is used as a skill
metric for the probabilistic forecast of the AO index. The ROC scores for the upper tercile (i.e.,
positive AO) and lower tercile (i.e., negative AO) were evaluated with probability thresholds
ranging from 0% to 100% at 20% intervals. In general, a ROC score above 0.5 indicates
skill better than climatology. As far as we are aware, this is the first assessment of the
probabilistic forecast skill of the AO using coupled seasonal forecasts. The
probabilistic forecast skill of the NAO, on the other hand, was studied using the ECMWF System 2 [Muller
et al., 2005].
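A minimal sketch of a ROC score computed with the probability thresholds given above (0% to 100% in steps of 20%) is shown below. It assumes that the forecast probability of an event is the fraction of ensemble members falling in the tercile category and that the score is the area under the ROC curve obtained by the trapezoidal rule; neither detail is stated in the text, and the function names are hypothetical.

```python
def event_probability(members, threshold, upper=True):
    """Fraction of ensemble members above (or below) a tercile threshold."""
    if upper:
        hits = sum(1 for m in members if m > threshold)
    else:
        hits = sum(1 for m in members if m < threshold)
    return hits / len(members)

def roc_area(probs, events, thresholds=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Area under the ROC curve for yearly event probabilities.

    probs  : forecast probability of the event for each year
    events : 1 if the event was observed in that year, else 0
    """
    n_event = sum(events)
    n_non = len(events) - n_event
    points = [(0.0, 0.0)]                      # (false alarm rate, hit rate)
    for thr in sorted(thresholds, reverse=True):
        # A warning is issued when the probability reaches the threshold.
        warn = [p >= thr - 1e-12 for p in probs]
        hits = sum(1 for w, e in zip(warn, events) if w and e)
        false_alarms = sum(1 for w, e in zip(warn, events) if w and not e)
        points.append((false_alarms / n_non, hits / n_event))
    points.sort()
    # Trapezoidal integration of hit rate against false alarm rate.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

In this sketch a forecast that always assigns the same probability gives an area of 0.5, which corresponds to the no-skill reference mentioned in the text, and a forecast that separates events from non-events perfectly gives 1.0.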


3. AO Prediction 

Figure 1 compares the AO SLP patterns represented in the three prediction systems with
that obtained from MERRA. MERRA shows a zonally symmetric pattern with anomalies of clearly opposite
sign between the Arctic and the mid-latitude oceans (North Pacific Ocean and
North Atlantic Ocean). All prediction systems are able to reproduce this pattern fairly well,
exhibiting action centers close to those of MERRA. The pattern correlations between MERRA
and each forecast have comparable values ranging between 0.86 and 0.90. The prediction
systems, however, commonly underestimate the amplitude of the peaks, especially over the
North Atlantic and the Kara Sea. Compared to the other prediction systems, GEOS-5 exhibits
a more realistic SLP anomaly pattern over the Kara Sea and northern Siberia. The AO
mode explains about 37% and 39% of the total interannual variability in GEOS-5 and GloSea4,
respectively, which is close to the observed value (41%). The percentage of variance explained
by the AO mode from CFSv2 is somewhat lower than that of the others; this might be due to the
greater frequency of mixing of the AO signal with the 2nd EOF mode.
Spatial patterns of surface temperature, 200 hPa zonal wind, and precipitation anomalies
associated with the AO mode from each reforecast are shown in Figure 2. North-south
oriented patterns of anomalous surface temperature appear over Eurasia and North
America in MERRA (Figure 2a). This surface temperature anomaly pattern is reasonably
reproduced in the reforecasts over land (Figures 2b-d), although its amplitude is
underestimated. The amplitude of the temperature variability over Siberia is more realistic in
GEOS-5 than in the other systems, and this might be linked to the more realistic
pressure pattern over Siberia and the Kara Sea (Figure 1d). The upper-level zonal wind
pattern from the forecast systems is consistent with that of MERRA with high statistical
significance, describing a realistic modulation of the jet stream corresponding to the phase of the
AO (Figures 2e-h). Nevertheless, there are system-dependent biases, such as shifts in the centers
of variability, that correspond to biases in the SLP variability. For example, the variability center
of GloSea4 and GEOS-5 is shifted westward in the North Pacific Ocean. Consistent with the jet
stream shift, precipitation is enhanced at high latitudes during the positive phase of the AO, but the
amplitudes in the forecasts are lower than observed. The forecast systems commonly fail
to capture the precipitation anomaly in East Asia (Figures 2i-l).

The above results demonstrate that the prediction systems are able to reproduce the observed
AO pattern at least to some extent. From now on, we focus on the prediction skill. Note that,
as described in Section 2, we use a single AO pattern obtained from MERRA, not each
system’s own pattern, for this purpose. The time series of the recent AO index (1997-2010) from
MERRA and the reforecasts are shown in Figure 3a. The reforecasts show a reasonable
prediction of the seasonal mean AO index. This includes the anomalously negative value in
2010, although GloSea4 and GEOS-5 underestimate the intensity of the negative anomaly.
Ensembles of the three prediction systems commonly show a large spread, though they tend
to show relatively small spread in several years. Table 2 shows the correlation coefficients
between the AO index of MERRA and that of each reforecast. Note that CFSv2 and GEOS-5
show much higher correlations for the recent period (1997-2010) than for the earlier
period (1983-1996). Similar to the skill of the deterministic forecasts of the AO index, the
skill of the probabilistic forecasts also shows substantial changes between the two periods
(Figure 4). Each reforecast shows marginal prediction skill for both positive and negative
phases of the AO for 1997-2010 (all ROC scores exceed 0.6), while the ROC scores for
1983-1996 (lower than 0.5 in the case of the upper tercile) are lower than those for the recent 14
years.
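For reference, the size of correlation needed for statistical significance with so few years can be estimated with a short sketch. The significance test behind Table 2 is not described, so the following is an assumption: a two-sided test based on the Fisher z-transform, with every year treated as independent. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def critical_correlation(n_years, confidence=0.95):
    """Approximate two-sided critical |r| for n_years independent samples."""
    # Fisher z = atanh(r) is roughly normal with std 1 / sqrt(n - 3).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z / math.sqrt(n_years - 3))

r95_14 = critical_correlation(14, 0.95)   # about 0.53
r99_14 = critical_correlation(14, 0.99)   # about 0.65
r95_28 = critical_correlation(28, 0.95)   # about 0.37
r99_28 = critical_correlation(28, 0.99)   # about 0.47
```

Under this approximation, a 14-year correlation must exceed roughly 0.53 to be significant at the 95% level, which illustrates why a value such as 0.46 for 1983-1996 carries no asterisk in Table 2 while 0.57 and 0.87 for 1997-2010 do.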

Figures 3b-d show month-to-month temporal correlation coefficients for December to
March, along with the corresponding results for the persistence forecast. Forecasts initialized in
November show higher temporal correlation coefficients in winter than persistence for 1997-2010,
while the skill of the dynamical predictions does not consistently exceed that of the persistence
forecast after February. The prediction skill for 1983-1996 is comparable to persistence after
December, consistent with the lower seasonal mean prediction skill during the early period (1983-1996)
indicated in Table 2. The reason for the lower prediction skill of GloSea4 in January
and February is not clear, but it seems to be related to model bias or to the
relatively small number of ensemble members. GloSea4 shows higher prediction skill
when the EOF derived from the forecasts themselves is used to derive the AO index (r = 0.54 for the DJF mean, compared to
0.42 in Table 2), which implies that model bias in the EOF pattern obscured the prediction skill of
the AO.
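The persistence baseline used in this comparison can be sketched as follows, assuming that the observed November AO index is carried forward unchanged as the forecast for each later month and scored with the same temporal correlation as the dynamical forecasts. The function names and the dictionary layout are hypothetical.

```python
import math
from statistics import fmean

def correlation(x, y):
    """Temporal (Pearson) correlation coefficient between two yearly series."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def persistence_skill(nov_index, observed_by_month):
    """Skill of using the November index as the forecast for each later month."""
    return {month: correlation(nov_index, obs)
            for month, obs in observed_by_month.items()}

def beats_persistence(model_skill, baseline_skill):
    """Per month: does the dynamical forecast skill exceed the baseline?"""
    return {month: model_skill[month] > baseline_skill[month]
            for month in model_skill}
```

A call such as `beats_persistence(model_skill, persistence_skill(nov, obs))` then returns one flag per target month, matching the criterion that a prediction is considered useful only when its skill exceeds that of persistence.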


4. Conclusion 
This study examined the skill of AO predictions using reforecast datasets made with
three state-of-the-art coupled ensemble prediction systems. The study focused in particular
on wintertime AO predictions using a set of reforecasts initialized around November over
multiple years. The three prediction systems all include interactive land, ocean, and sea ice
components coupled with the atmosphere, although the details of the formulations and the
initialization processes differ substantially among the systems. Our results show that
the seasonal forecast systems exhibit significant skill at predicting the AO up to 3 months of
forecast lead time for the recent 14 years. This suggests that useful AO predictions could be
issued in November for the following winter.

Our results highlight two aspects of the AO prediction problem. First, seasonal
prediction systems are able to reproduce the basic AO phenomenon itself, with high pattern
correlations in SLP ranging from 0.86 to 0.90. The forecast systems also demonstrate realistic
patterns of anomalous surface temperature, upper-level wind, and precipitation that are
associated with the AO, implying that those systems are able to resolve the key physical and
dynamical processes accompanying the AO. Second, the seasonal prediction systems
have the capability to forecast year-to-year variations of the AO, including the recent extreme
occurrences of the AO. The prediction skill does differ among the three systems, and this
likely reflects differences in the parameterizations and initialization processes of each system.
There is considerable spread among the ensemble members, suggesting the possibility of
future improvements in AO predictions.

The prediction skill for 1997-2010 was higher than that for the previous 14 years for both the
deterministic and probabilistic predictions. Riddle et al. [2013], who found this change earlier
in CFSv2 reforecasts, speculated that the difference was caused by systematic errors and
bias associated with the initialization prior to 1998. However, we cannot exclude other
possibilities (e.g., a mean state shift favoring greater predictability of the AO during the
recent period). For example, Li et al. [2013] suggested a strengthening in the relationship
between the AO and the El Niño-Southern Oscillation (ENSO) after the mid-1990s, with
possible links to interannual variability of sea ice. The correlation coefficient between the DJF-mean
AO index in this study and the NOAA Oceanic Niño Index from the website
(http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.shtml) was
0.02 for 1983-1996 and -0.59 for 1997-2010, suggesting a possible contribution of
changes in ENSO-AO coupling to the change in prediction skill of the AO index. Further
study is required to identify the mechanism for the higher prediction skill of the AO in dynamical
seasonal predictions in the recent period.

Arribas et al. [2011] did not show significant prediction skill for the NAO (which is
analogous to the AO), while in this study we found a much higher prediction skill for the AO.
Arribas et al. [2011] used an analysis period similar to that of this study, but the GloSea4 version in this study
has improved physical parameterizations, sea ice initialization, and
extended vertical resolution compared to the version used in Arribas et al. [2011]. This
implies that sea ice initialization and a fully represented stratosphere may play an important
role in the AO prediction skill.

CFSv2 showed the highest AO prediction skill among the three sets of reforecasts. The
better performance may be associated with the 9-hour coupled initialization in CFSR, which
reduces the bias from each boundary, although further investigation is required to verify the
benefit of the coupled initialization. The AO prediction skill of the multi-model
ensemble (MME, r = 0.78 for 1997-2010) was comparable to the skill of CFSv2, which
implies that the MME did not add much benefit in this case.


The short time period over which the prediction skill was evaluated makes it difficult to
assess any modulation of the AO by long-term variability such as the Pacific Decadal
Oscillation (PDO). For example, higher prediction skill of the NAO in recent decades has
also been shown in previous studies [Rodwell and Folland, 2002; Bierkens and Beek, 2009].
This change in skill was also found for the AO in CFSv2 [Riddle et al., 2013]. Therefore, it
is not possible to affirm that the level of skill found in this study will be the same in the future.
Table 1. Summary of the seasonal forecasting systems. Abbreviations and acronyms are defined
as follows: Met Office Unified Model (UM), Global Forecast System (GFS), Modular Ocean
Model version 4 (MOM4), Nucleus for European Modeling of the Ocean (NEMO), Met
Office Surface Exchange Scheme (MOSES), GEOS-integrated Ocean Data Assimilation
System (GEOS-iODAS [Vernieres et al., 2012]), Climate Forecast System Reanalysis (CFSR
[Saha et al., 2010]).

                      GloSea4                   CFSv2                     GEOS-5

Reforecast period     1996-2009                 1981-2010                 1981-2012

Model (atmosphere,    UM version 7.6,           GFS, MOM4, Noah           GEOS-5, MOM4,
ocean, land, and      NEMO 3.0, MOSES,          land model, and           Catchment Land
sea ice)              and CICE 4.1              3-layer sea ice model     Surface Model [Koster
                                                                          et al., 2000], and
                                                                          CICE 4.0

Horizontal            N96L85 (145x196)          T126L64 (181x360)         1°x1.25° (181x288)
resolution

Vertical levels       85 levels                 64 levels                 72 levels

Initial condition     ERA-Interim               CFSR (9h full-coupled     MERRA (atmosphere-land)
                      (atmosphere-land) and     initialization)           and GEOS-iODAS
                      NEMO-CICE data                                      (ocean-sea ice)
                      assimilation
                      (ocean-sea ice)

Number of ensemble    3-member on fixed         4-member on every 5       1-member on every 5
members               calendar dates (the       days beginning with       days with additional
                      1st, 9th, 17th and        Jan 1st of each year      members for the
Table 2. Correlation coefficients between the DJF-mean AO index from MERRA and each
forecast. Single and double asterisks indicate that the correlation coefficient is statistically
significant at the 95% and 99% confidence levels, respectively.

              1983-1996   1997-2010   1983-2010
GloSea4       n/a         0.42        n/a
CFSv2         0.46        0.87**      0.66**
GEOS-5        0.33        0.57*       0.43*
Persistence   -0.23       0.23        -0.25
Figure 1. DJF-mean sea level pressure anomaly regressed onto the leading PC for 1997-2010 for
(a) MERRA, (b) GloSea4, (c) CFSv2, and (d) GEOS-5 (unit: hPa). Contour lines mark
absolute values equal to 3 hPa. Percentages indicate the variance explained by the pattern (averaged
over the ensemble members).

Figure 2. DJF-mean surface temperature anomaly (1st row, unit: K), zonal wind anomaly at 200 hPa
(2nd row, unit: m/s), and normalized precipitation (3rd row, unitless) regressed onto the
AO index of each forecast for 1997-2010. Precipitation anomalies are normalized by the
monthly mean precipitation at each grid point. Dotted grid points indicate statistical
significance above the 90% confidence level.

Figure 3. (a) DJF-mean normalized AO index of MERRA (black solid line), GloSea4 (red
bars), CFSv2 (blue bars), and GEOS-5 (orange bars). The error bars show the ensemble spread of the AO
index between the first and third quartiles. Correlation coefficient of the AO index as a
function of forecast lead month for (b) GloSea4, (c) CFSv2, and (d) GEOS-5. The black dashed
line refers to the persistence forecast based on the MERRA November AO index for 1979-2012, and the colored
lines indicate the prediction skill for each period. The thin horizontal dashed line marks the 90%
confidence level for 14 years.

Figure 4. Sum of Relative Operating Characteristic (ROC) scores for the ensemble AO index
prediction for the upper tercile (red) and lower tercile (blue). The checkered bars indicate ROC
scores for 1983-1996, and the filled bars indicate ROC scores for 1997-2010.


[Figure 1 panel titles: (a) MERRA 40.8%, (b) GloSea4 38.5%, (c) CFSv2 30.5%, (d) GEOS-5 36.7%]